From Traditional to Modern: Domain Adaptation for Action Classification in Short Social Video Clips
Short internet video clips like vines exhibit a significantly wilder
distribution than traditional video datasets. In this paper, we focus on
the problem of unsupervised action classification in wild vines using
traditional labeled datasets. To this end, we use a simple domain
adaptation strategy based on data augmentation. We utilise the semantic
word2vec space as a common subspace in which to embed video features from
both the labeled source domain and the unlabelled target domain. Our method
incrementally augments the labeled source with target samples and
iteratively modifies the embedding function to bring the source and target
distributions together. Additionally, we utilise a multi-modal
representation that incorporates the noisy semantic information available
in the form of hash-tags. We show the effectiveness of this simple
adaptation technique on a test set of vines and achieve notable
improvements in performance.

Comment: 9 pages, GCPR, 201
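The incremental augmentation loop described above can be sketched as a simple self-training procedure. This is an illustrative assumption of the scheme, not the paper's exact algorithm: video features are assumed to already live in a shared word2vec-like subspace, class embeddings stand in for the learned embedding function, and confidence is a plain dot-product similarity.

```python
import numpy as np

def adapt(source_X, source_y, target_X, class_vecs, iters=3, frac=0.2):
    """Iteratively augment the labeled source with confident target samples.

    source_X : (n_s, d) labeled source features in the shared subspace
    target_X : (n_t, d) unlabelled target features
    class_vecs : (n_classes, d) semantic class embeddings (e.g. word2vec)
    """
    X, y = source_X.copy(), source_y.copy()
    for _ in range(iters):
        # score each target sample against the class embeddings
        sims = target_X @ class_vecs.T
        labels = sims.argmax(axis=1)
        conf = sims.max(axis=1)
        # pseudo-label the most confident fraction and add it to the source
        k = max(1, int(frac * len(target_X)))
        idx = np.argsort(-conf)[:k]
        X = np.vstack([X, target_X[idx]])
        y = np.concatenate([y, labels[idx]])
        # re-estimate class embeddings from the augmented set
        # (a crude stand-in for "modifying the embedding function")
        class_vecs = np.vstack(
            [X[y == c].mean(axis=0) for c in range(class_vecs.shape[0])]
        )
    return X, y, class_vecs
```

Each iteration pulls the two distributions together by letting confidently labeled target samples reshape the class embeddings.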
Improving the Accuracy of Action Classification Using View-Dependent Context Information
Proceedings of: 6th International Conference, HAIS 2011, Wroclaw, Poland, May 23-25, 2011

This paper presents a human action recognition system that decomposes the task into two subtasks. First, a view-independent classifier, shared between the multiple views to analyze, is applied to obtain an initial guess of the posterior distribution of the performed action. Then, this posterior distribution is combined with view-based knowledge to improve the action classification. This allows reusing the view-independent component when a new view has to be analyzed, requiring only the view-dependent knowledge to be specified. An example of the application of the system in a smart home domain is discussed.

This work was supported in part by Projects CICYT TIN2008-06742-C02-02/TSI, CICYT TEC2008-06732-C02-02/TEC, CAM CONTEXTS (S2009/TIC-1485) and DPS2008-07029-C02-02.
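The combination step can be sketched as a Bayesian product: the shared view-independent posterior is reweighted by view-dependent knowledge, modelled here as a per-view prior over actions. This is an illustrative assumption, not the paper's exact formulation.

```python
import numpy as np

def combine(posterior, view_prior):
    """Fuse a view-independent action posterior with view-dependent
    knowledge (modelled as a per-view prior over actions) via an
    elementwise product, then renormalize."""
    fused = posterior * view_prior
    return fused / fused.sum()
```

Adding a new view then only requires specifying its prior; the shared classifier is reused unchanged.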
Survey on Vision-based Path Prediction
Path prediction is a fundamental task for estimating how pedestrians or
vehicles are going to move in a scene. Because path prediction as a task of
computer vision uses video as input, various information used for prediction,
such as the environment surrounding the target and the internal state of the
target, need to be estimated from the video in addition to predicting paths.
Many prediction approaches that include understanding the environment and the
internal state have been proposed. In this survey, we systematically summarize
methods of path prediction that take video as input and extract features
from the video. Moreover, we introduce datasets used to evaluate path
prediction methods quantitatively.

Comment: DAPI 201
Wave Functions, Quantum Diffusion, and Scaling Exponents in Golden-Mean Quasiperiodic Tilings
We study the properties of wave functions and the wave-packet dynamics in
quasiperiodic tight-binding models in one, two, and three dimensions. The atoms
in the one-dimensional quasiperiodic chains are coupled by weak and strong
bonds aligned according to the Fibonacci sequence. The associated d-dimensional
quasiperiodic tilings are constructed from the direct product of d such chains,
which yields either the hypercubic tiling or the labyrinth tiling. This
approach allows us to consider rather large systems numerically. We show that
the wave functions of the system are multifractal and that their properties can
be related to the structure of the system in the regime of strong quasiperiodic
modulation by a renormalization group (RG) approach. We also study the dynamics
of wave packets to get information about the electronic transport properties.
In particular, we investigate the scaling behaviour of the return probability
of the wave packet with time. Applying again the RG approach we show that in
the regime of strong quasiperiodic modulation the return probability is
governed by the underlying quasiperiodic structure. Further, we also discuss
lower bounds for the scaling exponent of the width of the wave packet and
propose a modified lower bound for the absolutely continuous regime.

Comment: 25 pages, 13 figure
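The direct-product construction above can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: it builds the Fibonacci bond sequence by concatenation, diagonalizes the resulting 1D tight-binding chain, and forms the 2D spectrum as pairwise products of 1D eigenvalues, which is the factorization property of the labyrinth tiling; the bond strengths are arbitrary example values.

```python
import numpy as np

def fibonacci_word(n):
    # S_1 = "A", S_2 = "AB", S_k = S_{k-1} + S_{k-2}
    a, b = "A", "AB"
    for _ in range(n - 2):
        a, b = b, b + a
    return b

def chain_spectrum(word, strong=1.0, weak=0.5):
    """Eigenvalues of a 1D tight-binding chain whose hoppings follow the
    Fibonacci word (A = strong bond, B = weak bond), zero on-site energy."""
    t = np.array([strong if c == "A" else weak for c in word])
    n = len(t) + 1
    H = np.zeros((n, n))
    idx = np.arange(len(t))
    H[idx, idx + 1] = t
    H[idx + 1, idx] = t
    return np.linalg.eigvalsh(H)

# Labyrinth tiling in d = 2: eigenenergies factorize as products
# of the 1D chain energies, E_ij = E_i * E_j.
E1 = chain_spectrum(fibonacci_word(8))
E2d = np.multiply.outer(E1, E1).ravel()
```

The product structure is what makes rather large systems tractable: a d-dimensional spectrum follows from a single 1D diagonalization.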
Fusion of Single View Soft k-NN Classifiers for Multicamera Human Action Recognition
Proceedings of: 5th International Conference on Hybrid Artificial Intelligence Systems (HAIS 2010), San Sebastián, Spain, June 23-25, 2010

This paper presents two different classifier fusion algorithms applied in the domain of human action recognition from video. A set of cameras observes a person performing an action from a predefined set. For each camera view a 2D descriptor is computed and a posterior on the performed activity is obtained using a soft classifier. These posteriors are combined using voting and a Bayesian network to obtain a single belief measure to use for the final decision on the performed action. Experiments are conducted with different low-level frame descriptors on the IXMAS dataset, achieving results comparable to state-of-the-art 3D proposals, while only performing 2D processing.

This work was supported in part by Projects CICYT TIN2008-06742-C02-02/TSI, CICYT TEC2008-06732-C02-02/TEC, CAM CONTEXTS (S2009/TIC-1485) and DPS2008-07029-C02-02.
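The multi-view fusion step can be sketched in a few lines. This is a minimal stand-in, assuming one posterior per camera: averaging implements soft voting, while the product corresponds to naive-Bayes fusion under view independence (a crude proxy for the paper's Bayesian-network combination).

```python
import numpy as np

def fuse_posteriors(view_posteriors, method="sum"):
    """Combine per-camera soft-classifier posteriors into a single belief.

    'sum'     : average the posteriors (soft voting)
    'product' : multiply them (naive-Bayes fusion assuming independent views)
    """
    P = np.asarray(view_posteriors)      # shape: (n_views, n_actions)
    fused = P.mean(axis=0) if method == "sum" else P.prod(axis=0)
    return fused / fused.sum()
```

The final decision is then the argmax of the fused belief.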
An Overview of Contest on Semantic Description of Human Activities (SDHA) 2010
Abstract. This paper summarizes results of the 1st Contest on Semantic Description of Human Activities (SDHA), in conjunction with ICPR 2010. SDHA 2010 consists of three types of challenges: the High-level Human Interaction Recognition Challenge, the Aerial View Activity Classification Challenge, and the Wide-Area Activity Search and Recognition Challenge. The challenges are designed to encourage participants to test existing methodologies and develop new approaches for complex human activity recognition scenarios in realistic environments. We introduce three new public datasets through these challenges, and discuss results of state-of-the-art activity recognition systems designed and implemented by the contestants. A methodology using spatio-temporal voting [19] successfully classified segmented videos in the UT-Interaction datasets, but had difficulty correctly localizing activities in continuous videos. Both the method using local features [10] and the HMM-based method [18] successfully recognized actions from low-resolution videos (i.e. the UT-Tower dataset). We compare their results in this paper.
Comparative Evaluation of Action Recognition Methods via Riemannian Manifolds, Fisher Vectors and GMMs: Ideal and Challenging Conditions
We present a comparative evaluation of various techniques for action
recognition while keeping as many variables as possible controlled. We employ
two categories of Riemannian manifolds: symmetric positive definite matrices
and linear subspaces. For both categories we use their corresponding nearest
neighbour classifiers, kernels, and recent kernelised sparse representations.
We compare against traditional action recognition techniques based on Gaussian
mixture models and Fisher vectors (FVs). We evaluate these action recognition
techniques under ideal conditions, as well as their sensitivity in more
challenging conditions (variations in scale and translation). Despite recent
advancements for handling manifolds, manifold based techniques obtain the
lowest performance and their kernel representations are more unstable in the
presence of challenging conditions. The FV approach obtains the highest
accuracy under ideal conditions. Moreover, FV best deals with moderate scale
and translation changes.
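One of the manifold-based baselines above, nearest-neighbour classification on symmetric positive definite matrices, can be sketched with the log-Euclidean distance. This is an illustrative sketch under the assumption that each action clip is summarized by an SPD descriptor (e.g. a region covariance matrix); the helper names are ours.

```python
import numpy as np

def spd_log(M):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.log(w)) @ V.T

def log_euclidean_dist(A, B):
    """Log-Euclidean distance between two SPD descriptors: the Frobenius
    norm of the difference of their matrix logarithms."""
    return np.linalg.norm(spd_log(A) - spd_log(B), "fro")

def nn_classify(query, refs, labels):
    """Nearest-neighbour classification on the SPD manifold."""
    d = [log_euclidean_dist(query, R) for R in refs]
    return labels[int(np.argmin(d))]
```

Kernelised variants replace the raw distance with a kernel built from it; the plain distance above is the simplest member of that family.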
Multicamera Action Recognition with Canonical Correlation Analysis and Discriminative Sequence Classification
Proceedings of: 4th International Work-Conference on the Interplay Between Natural and Artificial Computation, IWINAC 2011, La Palma, Canary Islands, Spain, May 30 - June 3, 2011

This paper presents a feature fusion approach to the recognition of human actions from multiple cameras that avoids the computation of the 3D visual hull. Action descriptors are extracted for each of the available camera views and projected into a common subspace that maximizes the correlation between the components of the projections. That common subspace is learned using Probabilistic Canonical Correlation Analysis. The action classification is made in that subspace using a discriminative classifier. Results of the proposed method are shown for the classification of the IXMAS dataset.
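The shared-subspace step can be sketched with classical CCA, used here as a stand-in for the probabilistic variant the paper employs: whiten each view's descriptors, then take the SVD of the whitened cross-covariance to obtain maximally correlated projections.

```python
import numpy as np

def cca(X, Y, k=2, reg=1e-6):
    """Classical CCA between two view descriptor matrices X (n, dx) and
    Y (n, dy): returns projection bases A, B and the top-k canonical
    correlations. `reg` ridges the covariances for numerical stability."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    # whiten each view, then SVD of the cross-covariance
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy.T)
    A = Wx.T @ U[:, :k]      # projection for view 1
    B = Wy.T @ Vt[:k].T      # projection for view 2
    return A, B, s[:k]       # s holds the canonical correlations
```

A discriminative classifier is then trained on the projected descriptors `X @ A` and `Y @ B` in the common subspace.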
View-invariant action recognition
Human action recognition is an important problem in computer vision. It has a
wide range of applications in surveillance, human-computer interaction,
augmented reality, video indexing, and retrieval. The varying pattern of
spatio-temporal appearance generated by human action is key for identifying the
performed action. Much research has explored these dynamics of
spatio-temporal appearance to learn visual representations of human
actions. However, most research in action recognition focuses on some
common viewpoints, and these approaches do not perform well when there is a
change in viewpoint. Human actions are performed in a 3-dimensional environment
and are projected to a 2-dimensional space when captured as a video from a
given viewpoint. Therefore, an action will have a different spatio-temporal
appearance from different viewpoints. The research in view-invariant action
recognition addresses this problem and focuses on recognizing human actions
from unseen viewpoints.